We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real time, lack a dense scene representation, or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model that combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs in real time at 10 Hz and achieves accuracy comparable to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).
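Entropy modeling is a key component of high-performance image compression algorithms. Recent developments in autoregressive context modeling have helped learning-based methods surpass their classical counterparts. However, the performance of these models can be improved further, because spatio-channel dependencies in the latent space are not fully exploited and context adaptivity is implemented suboptimally. Inspired by the adaptive characteristics of transformers, we propose a transformer-based context model, named ContextFormer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with ContextFormer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% bit-rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2 and outperforms various learning-based models in terms of PSNR and MS-SSIM.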